RAPID MIXING AND MARKOV BASES


TOBIAS WINDISCH 


Abstract. The mixing behaviour of random walks on lattice points of polytopes 
using Markov bases is examined. It is shown that under a dilation of the underlying 
polytope, these random walks do not mix rapidly when a fixed Markov basis is used. 
We also show that this phenomenon does not disappear after adding more moves to 
the Markov basis. Avoiding rejections by sampling only applicable moves also does not
lead to an asymptotic improvement. As a way out, we introduce a method to adapt Markov
bases in order to achieve the fastest mixing behaviour.


1. Introduction 

Random walks have been successfully used in various applications to explore combinatorial
structures where a complete enumeration is computationally prohibitive [17, 21, 9]. In many
of these applications, the underlying discrete objects correspond to the elements of a fiber
$\mathcal{F}_{A,b} := \{u \in \mathbb{N}^d : Au = b\}$ of a matrix $A \in \mathbb{Z}^{m \times d}$ with
$\ker_{\mathbb{Z}}(A) \cap \mathbb{N}^d = \{0\}$ and a right-hand side $b \in \mathbb{Z}^m$. Exploring fibers with
random walks requires connecting their elements by edges so that there is a path between
any two of them. In their groundbreaking work [9], Diaconis and Sturmfels have shown
how to endow $\mathcal{F}_{A,b}$ with the structure of a connected graph in a computational way:
For a finite set $\mathcal{M} \subset \ker_{\mathbb{Z}}(A)$, the fiber graph $\mathcal{F}_{A,b}(\mathcal{M})$ is the graph on
$\mathcal{F}_{A,b}$ in which two nodes $u, v \in \mathcal{F}_{A,b}$ are adjacent if $u - v \in \pm\mathcal{M}$. They have coined the
term Markov basis, which denotes a finite set $\mathcal{M} \subset \ker_{\mathbb{Z}}(A)$ such that $\mathcal{F}_{A,b}(\mathcal{M})$ is
connected for all $b \in \mathbb{Z}^m$. Their main result shows that a Markov basis can be obtained by a
Gröbner basis computation in a polynomial ring [9, Theorem 3.1]. Markov bases can be used to
enumerate locally the neighborhood of a node in the fiber graph, which makes them a
general machinery to approximate any probability distribution on any fiber of a matrix.
The number of steps needed to approximate a given distribution sufficiently is the mixing
time of the random walk. Even though the computation of Markov bases received
a lot of attention in the last decade [14, 18, 30, 28, 29], mixing results on fiber graphs
are still rare. It was shown in [6] that the mixing time of random walks on two-way
contingency tables with the same row and column sums using a minimal Markov basis
is quadratic in the diameter of the underlying fiber graph and a similar result is true
for random walks on lattice points of polytopes that use the unit vectors as Markov
basis vectors [7, 33].

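In this paper, we study the mixing behaviour of the simple walk on fiber graphs,
whose stationary distribution is the uniform distribution on $\mathcal{F}_{A,b}$. Our main result
concerns the mixing behaviour of fiber graph sequences that use a fixed Markov basis:

Theorem 1.1. Let $A \in \mathbb{Z}^{m \times d}$, let $\mathcal{M} \subset \ker_{\mathbb{Z}}(A)$ be a Markov basis for $A$, and let
$(b_i)_{i \in \mathbb{N}}$ be a dominated sequence in $\mathbb{N}A$. Then $(\mathcal{F}_{A,b_i}(\mathcal{M}))_{i \in \mathbb{N}}$ is no expander. If
additionally $(b_i)_{i \in \mathbb{N}}$ has a meaningful parametrization, then $(\mathcal{F}_{A,b_i}(\mathcal{M}))_{i \in \mathbb{N}}$ is not
rapidly mixing.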

Surprisingly, walking randomly on a fiber with a larger Markov basis (Remark 3.24)
or avoiding rejections by sampling only applicable moves (Remark 3.27) does not improve
the asymptotic mixing behaviour. The conclusion we draw from these results is
that an adaption of the Markov basis has to take place depending on the right-hand
side $b \in \mathbb{Z}^m$. In Section 4, we adapt the Markov basis so that the underlying graph
becomes the complete graph with additional loops. Adding more moves to a Markov
basis increases the rejection rate, i.e. the number of loops, on every node of the fiber.
Thus, it is a fine line to find the proper number of moves to add without slowing down
the random walk. The fastest mixing behaviour is obtained for expander graphs [16]
and we show how to obtain expanders on fibers under mild assumptions on the diameter
(Corollary 4.3). The idea of constructing expanders on fibers is due to Alexander
Engström, who used the zig-zag product to obtain expanders [13]. Our method is different
and yields for fixed $n \in \mathbb{N}$ an expanding family for $n \times n$ contingency tables where
all row and column sums are equal. Our adapted Markov basis can become arbitrarily
large and a priori it is not easy to draw a move from it uniformly at random. It remains
an interesting problem - both from a combinatorial and statistical side - to understand
the structure of the adapted Markov basis and how one can draw from it efficiently.

Conventions and Notation. The natural numbers are $\mathbb{N} := \{0, 1, 2, \dots\}$. For any
$n \in \mathbb{N}$, we set $[n] := \{m \in \mathbb{N} : 1 \le m \le n\}$ and we use $\mathbb{N}_{>n}$ and $\mathbb{N}_{\ge n}$ to denote
the subsets of $\mathbb{N}$ whose elements are strictly greater than $n$ and greater than or equal to $n$
respectively. A graph is always undirected and can have multiple loops. Let $G = (V, E)$ be a graph.
If there is $d \in \mathbb{N}$ such that all the nodes of $G$ are incident to $d$ edges, then $G$ is
$d$-regular. The distance $d_G(v, w)$ of two nodes $v, w \in V$ is the number of edges in a
shortest path connecting $v$ and $w$. The diameter $\operatorname{diam}(G)$ of $G$ is the maximal distance
that appears between any pair of its nodes. Here, it is assumed that all fiber-defining
matrices $A \in \mathbb{Z}^{m \times d}$ fulfill $\ker_{\mathbb{Z}}(A) \cap \mathbb{N}^d = \{0\}$. The affine semigroup in $\mathbb{Z}^m$ generated
by the column vectors of $A$ is denoted by $\mathbb{N}A$. Let $(a_i)_{i \in \mathbb{N}}$ and $(b_i)_{i \in \mathbb{N}}$ be two sequences
in $\mathbb{Q}$, then $(a_i)_{i \in \mathbb{N}} \in \mathcal{O}(b_i)_{i \in \mathbb{N}}$ if there exist $i_0 \in \mathbb{N}$ and $C \in \mathbb{Q}_{>0}$ such that
$|a_i| \le C \cdot |b_i|$ for all $i \ge i_0$. Similarly, we define $(a_i)_{i \in \mathbb{N}} \in \Omega(b_i)_{i \in \mathbb{N}}$ if there exist
$i_0 \in \mathbb{N}$ and $C \in \mathbb{Q}_{>0}$ such that $|a_i| \ge C \cdot |b_i|$ for all $i \ge i_0$. The sequence $(a_i)_{i \in \mathbb{N}}$ is a
subsequence of $(b_i)_{i \in \mathbb{N}}$ if there is a strongly increasing sequence $(i_k)_{k \in \mathbb{N}}$ in $\mathbb{N}$ such that
$a_k = b_{i_k}$ for all $k \in \mathbb{N}$.

2. Markov chains on fiber graphs


Let $G = (V, E)$ be an undirected graph with $V = \{v_1, \dots, v_n\}$. For $v_i, v_j \in V$, let
$A^G_{ij}$ be the number of edges in $E$ with endpoints $v_i$ and $v_j$, then the matrix $A^G \in \mathbb{N}^{n \times n}$
is the adjacency matrix of $G$. For $v \in V$, let $\deg_G(v)$ be the number of edges incident
to $v$ in $G$. The simple walk on $G$ has transition probabilities

$$S^G_{ij} := \begin{cases} \dfrac{A^G_{ij}}{\deg_G(v_i)}, & \text{if } \{v_i, v_j\} \in E \\ 0, & \text{otherwise.} \end{cases}$$

The simple walk on $G$ comes along with a discrete-time Markov chain whose state
space is the node set $V$ of the graph [3]. Let $\pi_0 \in [0,1]^n$ be an initial distribution
on $V$. For any $t \in \mathbb{N}$, let $\pi_t := \pi_0 \cdot (S^G)^t \in [0,1]^n$, then $\pi_t(i)$ is the probability
that the simple walk with initial distribution $\pi_0$ is at $v_i$ at time $t$. The Markov chain
on $G$ is aperiodic if for all $i \in [n]$, $\gcd\{t \in \mathbb{N}_{>0} : (S^G)^t_{ii} > 0\} = 1$, symmetric if
$S^G$ is symmetric, and irreducible if for all $i, j \in [n]$ there exists $t \in \mathbb{N}$ such that
$(S^G)^t_{ij} > 0$. An aperiodic and irreducible Markov chain converges towards a unique
stationary distribution $\pi \in [0,1]^n$ [20, Theorem 4.9] and the second largest eigenvalue
modulus (SLEM) $\lambda$ of $S^G$ is a measurement of the convergence rate [16, Section 3]:
$(\|\pi_t - \pi\|)_{t \in \mathbb{N}} \in \mathcal{O}(\lambda^t)_{t \in \mathbb{N}}$.


Remark 2.1. If $G$ is $d$-regular, then $S^G = \frac{1}{d} A^G$ and hence $S^G$ is symmetric. If $G$ is
also connected, then the simple walk is irreducible and converges towards the uniform
distribution $\pi = \frac{1}{|V|} \cdot (1, \dots, 1)^T \in [0,1]^{|V|}$ on $V$.
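The convergence $\pi_t \to \pi$ can be observed numerically on a tiny regular graph. The following is a minimal sketch, not part of the original text: the function names `simple_walk_matrix` and `evolve` are hypothetical, and the example graph (a path on four nodes with one loop at each end, so that it is 2-regular) is chosen only for illustration.

```python
def simple_walk_matrix(adjacency):
    """Build S with S[i][j] = A[i][j] / deg(v_i); a loop counts once towards the degree."""
    S = []
    for row in adjacency:
        degree = sum(row)
        S.append([a / degree for a in row])
    return S


def evolve(pi, S, steps):
    """Compute pi_t = pi_0 * S^t by repeated vector-matrix multiplication."""
    n = len(pi)
    for _ in range(steps):
        pi = [sum(pi[i] * S[i][j] for i in range(n)) for j in range(n)]
    return pi


# Path on four nodes with one loop at each end: every node is incident to 2 edges.
adjacency = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]
S = simple_walk_matrix(adjacency)
pi_t = evolve([1.0, 0.0, 0.0, 0.0], S, 200)
```

Since the graph is connected, 2-regular, and has loops, `pi_t` should be numerically indistinguishable from the uniform distribution after a few hundred steps.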


The closer the second largest eigenvalue modulus of a random walk is to 1, the slower the
convergence to its stationary distribution. The next definition states under which
conditions we still have a polynomial bound on the mixing time.


Definition 2.2. For any $i \in \mathbb{N}$, let $G_i = (V_i, E_i)$ be a graph and let $\lambda_i$ be the second
largest eigenvalue modulus of $S^{G_i}$. The sequence $(G_i)_{i \in \mathbb{N}}$ is rapidly mixing if there is a
polynomial $p \in \mathbb{Q}_{\ge 0}[t]$ such that for all $i \in \mathbb{N}$, $\lambda_i \le 1 - \frac{1}{p(\log |V_i|)}$. The sequence
$(G_i)_{i \in \mathbb{N}}$ is an expander if there exists $\epsilon > 0$ such that for all $i \in \mathbb{N}$, $\lambda_i \le 1 - \epsilon$.


Expander graphs are in high demand in computer science because of their good
mixing behaviour. Their name relates to the fact that their edge-expansion (Definition 3.6)
can be strictly bounded away from zero (Proposition 3.8). The mixing time of
rapidly mixing Markov chains can be bounded by a polynomial in the logarithm of
the size of the state space (see [25, Section 2.3] or [3, Section 1.1.2]) and thus only
logarithmically many nodes have to be visited by the random walk to converge. The
key player of this paper is the simple walk on the following type of graph:

Definition 2.3. Let $\mathcal{F}, \mathcal{M} \subset \mathbb{Z}^d$ be finite sets. The fiber graph $\mathcal{F}(\mathcal{M})$ is the graph on
$\mathcal{F}$ where two nodes $u, v \in \mathcal{F}$ are adjacent if $u - v \in \pm\mathcal{M}$ and where every node $v \in \mathcal{F}$
gets a loop for every $m \in \pm\mathcal{M}$ that satisfies $v + m \notin \mathcal{F}$.

Recall that graphs can have multiple loops. The edges incident to a node $v \in \mathcal{F}$
correspond precisely to elements in $\pm\mathcal{M}$ and thus the graph is $|\pm\mathcal{M}|$-regular.
If $0 \in \mathcal{M}$, then $v - v \in \pm\mathcal{M}$ and thus every node has at least one loop. In order to
run irreducible Markov chains on fiber graphs, these graphs have to be connected.

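Definition 2.4. Let $\mathcal{F}, \mathcal{M} \subset \mathbb{Z}^d$ be finite sets, then $\mathcal{M}$ is a Markov basis for $\mathcal{F}$ if the
graph $\mathcal{F}(\mathcal{M})$ is connected. Let $I$ be a set of indices, $d_i \in \mathbb{N}$ be natural numbers and
$\mathcal{F}_i \subset \mathbb{Z}^{d_i}$ and $\mathcal{M}_i \subset \mathbb{Z}^{d_i}$ be finite sets for any $i \in I$. A sequence $(\mathcal{M}_i)_{i \in I}$ is a Markov
basis for $(\mathcal{F}_i)_{i \in I}$ if $\mathcal{M}_i$ is a Markov basis for $\mathcal{F}_i$ for all $i \in I$. A finite set $\mathcal{M} \subset \mathbb{Z}^d$ is a
Markov basis for $(\mathcal{F}_i)_{i \in I}$ with $\mathcal{F}_i \subset \mathbb{Z}^d$ if $(\mathcal{M})_{i \in I}$ is a Markov basis for $(\mathcal{F}_i)_{i \in I}$.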

In many applications, $\mathcal{F} \subset \mathbb{Z}^d$ is given implicitly by $\mathbb{Z}$-linear equations and inequalities
and thus enumerating its complete structure is infeasible. In algebraic statistics for instance, $\mathcal{F}$
equals $\mathcal{F}_{A,b}$ for a matrix $A \in \mathbb{Z}^{m \times d}$ and a right-hand side $b \in \mathbb{Z}^m$ [10]. Note that our
general assumption $\ker_{\mathbb{Z}}(A) \cap \mathbb{N}^d = \{0\}$ makes $\mathcal{F}_{A,b}$ finite for all $b \in \mathbb{N}A$. We thus call
a set $\mathcal{M} \subset \mathbb{Z}^d$ a Markov basis for $A$ if $\mathcal{M}$ is a Markov basis for $(\mathcal{F}_{A,b})_{b \in \mathbb{N}A}$. If one can
efficiently verify whether $v \in \mathbb{Z}^d$ is contained in $\mathcal{F}$ or not, as for $\mathcal{F} = \mathcal{F}_{A,b}$, then it is
possible to explore $\mathcal{F}$ with the simple walk using $\mathcal{M}$ as follows: At a given node $v \in \mathcal{F}$,
select uniformly an element $m \in \pm\mathcal{M}$ and walk along the edge given by $m$, which
either is a loop at $v$ or leads to a different node $v + m \in \mathcal{F}$.
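The exploration procedure just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the author's implementation: the helper names `in_fiber` and `simple_walk` are hypothetical, and the matrix $A = (1, 1, 1)$ with the two moves below is only a small test case.

```python
import random


def in_fiber(A, b, v):
    """Membership test for the fiber: v has non-negative entries and A v = b."""
    if any(x < 0 for x in v):
        return False
    return all(sum(a * x for a, x in zip(row, v)) == rhs for row, rhs in zip(A, b))


def simple_walk(A, b, moves, start, steps, rng):
    """Simple walk on the fiber graph: draw m uniformly from +-M, stay put if v + m leaves the fiber."""
    plus = {tuple(m) for m in moves}
    minus = {tuple(-x for x in m) for m in moves}
    signed = sorted(plus | minus)
    v = tuple(start)
    path = [v]
    for _ in range(steps):
        m = rng.choice(signed)
        w = tuple(x + y for x, y in zip(v, m))
        if in_fiber(A, b, w):
            v = w  # walk along a proper edge
        path.append(v)  # otherwise the step is a loop (a rejection)
    return path


A = [[1, 1, 1]]
b = [4]
moves = [(1, -1, 0), (1, 0, -1)]
path = simple_walk(A, b, moves, (4, 0, 0), 200, random.Random(0))
```

Only the membership test is needed at run time; the fiber itself is never enumerated, which is the point of the construction.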

Lemma 2.5. Let $\mathcal{F} \subset \mathbb{Z}^d$ be a finite and non-empty set and $\mathcal{M} \subset \mathbb{Z}^d$ a Markov basis
for $\mathcal{F}$. The simple walk on $\mathcal{F}(\mathcal{M})$ is irreducible, aperiodic, symmetric, reversible, and
its stationary distribution is the uniform distribution on $\mathcal{F}$.

Proof. The random walk is irreducible and symmetric since $\mathcal{F}(\mathcal{M})$ is connected and
$|\pm\mathcal{M}|$-regular (Remark 2.1). Thus, it suffices to show that $\mathcal{F}(\mathcal{M})$ has one aperiodic
state to show that all states are aperiodic. Choose $v \in \mathcal{F}$ and $m \in \mathcal{M}$ arbitrarily.
Since $\mathcal{F}$ is finite, let $p \in \mathbb{N}$ be the largest natural number such that $v + pm \in \mathcal{F}$.
Then $m$ cannot be applied on $v + pm$ and thus $v + pm$ has a loop. The reversibility
follows immediately and since the transition matrix of the simple walk is symmetric,
the uniform distribution is the unique stationary distribution. □

Remark 2.6. The Metropolis-Hastings methodology allows one to modify the simple walk
so that it converges to any given probability distribution on $\mathcal{F}$ [20, Section 3].

3. Expanding in fixed dimension

Let $A \in \mathbb{Z}^{m \times d}$ and $\mathcal{M} \subset \ker_{\mathbb{Z}}(A)$ be a Markov basis for $A$. In this section, we study
the mixing behaviour of $(\mathcal{F}_{A,b_i}(\mathcal{M}))_{i \in \mathbb{N}}$ for sequences $(b_i)_{i \in \mathbb{N}}$ in $\mathbb{N}A$ that are almost rays
(Definition 3.2). Roughly speaking, our strategy is to show that the sequence of second
largest eigenvalues $(\lambda_i)_{i \in \mathbb{N}}$ of this graph sequence satisfies $\lambda_i \ge 1 - \frac{C}{i}$ for a constant
$C \in \mathbb{Q}_{>0}$ and sufficiently many $i \in \mathbb{N}$. This leads, together with an assumption on the
growth of the fiber (Definition 3.17), to a slow mixing result (Theorem 1.1). Our proof
uses the well-known connection between the second largest eigenvalue modulus of the
simple walk and the edge-expansion of the underlying graph (Proposition 3.8). Thus,
our goal is to bound the edge-expansion of $(\mathcal{F}_{A,b_i}(\mathcal{M}))_{i \in \mathbb{N}}$ from above appropriately.
Here, we use a particular property of $(b_i)_{i \in \mathbb{N}}$, namely that we can translate a smaller
fiber into a larger fiber: $u + \mathcal{F}_{A,b_i} \subseteq \mathcal{F}_{A,b_j}$. It is then left to count the number of Markov
moves that leave the subset $u + \mathcal{F}_{A,b_i}$ in $\mathcal{F}_{A,b_j}$, which is done by Lemma 3.11 and
Ehrhart's theory. To start with, let us make precise the properties of sequences in $\mathbb{N}A$
that are crucial in the proof of Theorem 1.1.

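Definition 3.1. Let $A \in \mathbb{Z}^{m \times d}$ and let $(b_i)_{i \in \mathbb{N}}$ be a sequence in $\mathbb{N}A$. The sequence
$(b_i)_{i \in \mathbb{N}}$ is a ray in $\mathbb{N}A$ if there is $b \in \mathbb{N}A$ such that $(b_i)_{i \in \mathbb{N}} = (i \cdot b)_{i \in \mathbb{N}}$.

We need the following terminology for our next definition: For $b \in \mathbb{N}A$, the $\mathbb{Q}$-relaxation
of $\mathcal{F}_{A,b}$ is the polytope $\mathcal{R}_{A,b} := \{x \in \mathbb{Q}^d_{\ge 0} : Ax = b\}$.

Definition 3.2. Let $A \in \mathbb{Z}^{m \times d}$. A sequence $(b_i)_{i \in \mathbb{N}}$ in $\mathbb{N}A$ is dominated if there exists $b \in \mathbb{N}A$
with $\dim(\mathcal{R}_{A,b}) > 0$ such that $b_i - i \cdot b \in \mathbb{N}A$ for all $i \in \mathbb{N}$ and if there is $u \in \mathcal{F}_{A,b}$ and
$w_i \in \mathcal{F}_{A,b_i - i \cdot b}$ with $\operatorname{supp}(w_i) \subseteq \operatorname{supp}(u)$ for all $i \in \mathbb{N}$.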

Being dominated is a sufficient, though technical, condition on
$(b_i)_{i \in \mathbb{N}}$ that is crucial in our proof of the asymptotic growth of the second largest
eigenvalue modulus of $(\mathcal{F}_{A,b_i}(\mathcal{M}))_{i \in \mathbb{N}}$. The prime example of a dominated sequence the
reader should have in mind is a ray in the semigroup $\mathbb{N}A$:

Remark 3.3. The ray $(i \cdot b)_{i \in \mathbb{N}}$ with $b \in \mathbb{N}A$ and $\dim(\mathcal{R}_{A,b}) > 0$ is dominated by $b$.

Dominated sequences appear, for instance, as subsequences of sequences whose distance
to the facets of $\mathbb{N}A$ becomes arbitrarily large. Let $H_A(b) := \min\{\operatorname{dist}(b, F) :
F \text{ facet of } \mathbb{N}A\}$, where $\operatorname{dist}(b, F) \in \mathbb{Q}_{\ge 0}$ denotes the distance between $b$ and $F \subset \mathbb{N}A$.

Proposition 3.4. Let $A \in \mathbb{Z}^{m \times d}$ with non-trivial kernel. Let $(b_i)_{i \in \mathbb{N}}$ be a sequence in
$\mathbb{N}A$ with $\limsup_{i \in \mathbb{N}} H_A(b_i) = \infty$, then $(b_i)_{i \in \mathbb{N}}$ has a dominated subsequence.

Proof. Let $a_1, \dots, a_d \in \mathbb{Z}^m$ be the columns of $A$ and let $c := a_1 + \dots + a_d$. First, we
show the following: For every $k \in \mathbb{N}$ there exists $m_k \in \mathbb{N}$ such that any $b \in \mathbb{N}A$
with $H_A(b) > m_k$ is contained in $k \cdot c + \mathbb{N}A$. The set $\mathbb{N}A \setminus (k \cdot c + \mathbb{N}A)$ is contained
in finitely many hyperplanes parallel to the facets of $\mathbb{N}A$. Hence, choosing $m_k \in \mathbb{N}$
large enough, every $b \in \mathbb{N}A$ with $H_A(b) > m_k$ cannot be in $\mathbb{N}A \setminus (k \cdot c + \mathbb{N}A)$. The
statement of the proposition follows immediately because $\limsup_{i \in \mathbb{N}} H_A(b_i) = \infty$ implies
that there is $i_k \in \mathbb{N}$ such that $H_A(b_{i_k}) > m_k$. Hence, for all $k \in \mathbb{N}$, $b_{i_k} \in k \cdot c + \mathbb{N}A$. In
particular, $(b_{i_k})_{k \in \mathbb{N}}$ is dominated by $c$ since $\mathcal{F}_{A,c}$ has an element with full support and
since $\dim(\mathcal{R}_{A,c}) = \dim(\ker_{\mathbb{Z}}(A)) > 0$. □


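Remark 3.5. The converse of Proposition 3.4 is not true. For instance, take the matrix

$$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

and $b = (2, 0)^T \in \mathbb{Z}^2$. The ray $(i \cdot b)_{i \in \mathbb{N}}$ is dominated since $\dim(\mathcal{R}_{A,b}) > 0$. However,
since $\{i \cdot b : i \in \mathbb{N}\}$ is contained in a facet of $\mathbb{N}A$, $H_A(i \cdot b) = 0$ for all $i \in \mathbb{N}$.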


To get a handle on the second largest eigenvalue of the simple walk, we use the following
connecting piece between statistics and combinatorics.


Definition 3.6. Let $G = (V, E)$ be a graph and $S \subseteq V$. Then $E_G(S) \subseteq E$ denotes the
set of all edges with endpoints in $S$ and $V \setminus S$. The edge-expansion of $G$ is

$$h(G) := \min\left\{ \frac{|E_G(S)|}{|S|} : S \subseteq V,\ 0 < 2|S| \le |V| \right\}.$$

Remark 3.7. The invariant $h(G)$ has many names in the literature, like Cheeger
constant [4] or isoperimetric number [22]. Also, the conductance $\Phi$ of the simple walk
on a $d$-regular graph $G$ fulfills $\Phi \cdot d = h(G)$ [26].

Proposition 3.8. Let $G = (V, E)$ be a connected and $d$-regular graph and let $\lambda$ be the
second largest eigenvalue modulus of $S^G$, then $\lambda \ge 1 - \frac{2}{d} \cdot h(G)$.

Proof. This is [16, Theorem 4.11]. □

Example 3.9. For any $d \in \mathbb{N}$, let $A_d = (1, \dots, 1) \in \mathbb{Z}^{1 \times d}$ and let $e_k \in \mathbb{Z}^d$ be the $k$-th
unit vector of $\mathbb{Z}^d$, then the set $\mathcal{M}_d := \{e_1 - e_k : 2 \le k \le d\}$ is a Markov basis for
$A_d$. The graph $\mathcal{F}_{A_2,i}(\mathcal{M}_2)$ is isomorphic to the path graph on $[i + 1]$ for any $i \in \mathbb{N}$ and
hence its edge-expansion is $\frac{2}{i+1}$ if $i$ is odd and $\frac{2}{i}$ when $i$ is even [22, Section 2]. Since
$|\pm\mathcal{M}_2| = 2$, the second largest eigenvalue modulus $\lambda_i$ of the simple walk on $\mathcal{F}_{A_2,i}(\mathcal{M}_2)$
satisfies $\lambda_i \ge 1 - \frac{2}{i}$ by Proposition 3.8. Hence, the sequence $(\mathcal{F}_{A_2,i}(\mathcal{M}_2))_{i \in \mathbb{N}}$ is neither
an expander nor, because of $\log |\mathcal{F}_{A_2,i}| = \log(i + 1)$, rapidly mixing.
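For very small fibers, the edge-expansion of Definition 3.6 can be computed by brute force over all admissible subsets, which gives a quick check of the values quoted in Example 3.9. The sketch below is illustrative only: `fiber_Ad` and `edge_expansion` are hypothetical helper names, and the enumeration is exponential in the number of nodes, so it is only meant for toy cases.

```python
from fractions import Fraction
from itertools import combinations, product


def fiber_Ad(d, i):
    """All vectors in N^d with coordinate sum i, i.e. the fiber of A_d = (1, ..., 1) over i."""
    return [v for v in product(range(i + 1), repeat=d) if sum(v) == i]


def edge_expansion(nodes, moves):
    """Minimum of |E_G(S)| / |S| over all S with 0 < 2|S| <= |V| (loops never leave S)."""
    node_set = set(nodes)
    plus = {tuple(m) for m in moves}
    minus = {tuple(-x for x in m) for m in moves}
    signed = plus | minus
    best = None
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            S = set(subset)
            leaving = 0
            for v in S:
                for m in signed:
                    w = tuple(x + y for x, y in zip(v, m))
                    if w in node_set and w not in S:
                        leaving += 1  # each leaving edge is counted once, from its endpoint in S
            ratio = Fraction(leaving, size)
            if best is None or ratio < best:
                best = ratio
    return best


M2 = [(1, -1)]
values = {i: edge_expansion(fiber_Ad(2, i), M2) for i in range(1, 7)}
```

For the path graphs of Example 3.9 the computed values decay like $2/i$, which is the behaviour that rules out an expander.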

The edge-expansion of a graph can be bounded from above by dividing the number
of edges leaving a fixed subset by the size of this particular subset. For certain subsets
of fibers, it is possible to give a description of the nodes that are incident to edges
which leave this set. Intuitively, those nodes lie on the boundary of this set.

Definition 3.10. Let $A \in \mathbb{Z}^{m \times d}$, $b \in \mathbb{N}A$, and $\mathcal{M} \subset \ker_{\mathbb{Z}}(A)$. For $u \in \mathbb{N}^d$, the
$u$-boundary of $\mathcal{F}_{A,b}$ is
$\partial^u_{\mathcal{M}}(\mathcal{F}_{A,b}) := \{v \in u + \mathcal{F}_{A,b} : \exists m \in \pm\mathcal{M} : v + m \in \mathbb{N}^d \setminus (u + \mathcal{F}_{A,b})\}$.

Figure 1. Let $A_3$ and $\mathcal{M}_3$ be as in Example 3.9. The white points represent the
nodes of $u$-boundaries $\partial^u_{\mathcal{M}_3}(\mathcal{F}_{A_3,3})$ for different translations $u$ inside a larger fiber
(from left to right).

Figure 1 justifies in a way that $\partial^u_{\mathcal{M}}(\mathcal{F}_{A,b})$ can indeed be regarded as a boundary.
With this, the number of outgoing edges in a translated fiber $u + \mathcal{F}_{A,b}$ within a larger
fiber can be bounded from above:


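Lemma 3.11. Let $A \in \mathbb{Z}^{m \times d}$ and $b, b' \in \mathbb{N}A$ with $2|\mathcal{F}_{A,b}| \le |\mathcal{F}_{A,b'+b}|$. Then for any
finite set $\mathcal{M} \subset \ker_{\mathbb{Z}}(A)$ and all $u \in \mathcal{F}_{A,b'}$,

$$h(\mathcal{F}_{A,b'+b}(\mathcal{M})) \le \frac{2|\mathcal{M}| \cdot |\partial^u_{\mathcal{M}}(\mathcal{F}_{A,b})|}{|\mathcal{F}_{A,b}|}.$$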


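Proof. By assumption, $u + \mathcal{F}_{A,b} \subseteq \mathcal{F}_{A,b'+b}$ and since
$2|u + \mathcal{F}_{A,b}| = 2|\mathcal{F}_{A,b}| \le |\mathcal{F}_{A,b'+b}|$,

$$h(\mathcal{F}_{A,b'+b}(\mathcal{M})) \le \frac{|E_{\mathcal{F}_{A,b'+b}(\mathcal{M})}(u + \mathcal{F}_{A,b})|}{|u + \mathcal{F}_{A,b}|}.$$

The edges leaving the set $u + \mathcal{F}_{A,b}$ in $\mathcal{F}_{A,b'+b}(\mathcal{M})$ are precisely those with endpoints
in $\partial^u_{\mathcal{M}}(\mathcal{F}_{A,b})$. Every node of $\mathcal{F}_{A,b'+b}(\mathcal{M})$ has at most $|\pm\mathcal{M}|$ incident edges and hence
$|E_{\mathcal{F}_{A,b'+b}(\mathcal{M})}(u + \mathcal{F}_{A,b})|$ is bounded from above by $2|\mathcal{M}| \cdot |\partial^u_{\mathcal{M}}(\mathcal{F}_{A,b})|$. □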


The size of the entries in a Markov basis is crucial to determine the size of the boundary.
The larger those entries are, the more nodes are in the boundary (Lemma 3.13)
since more nodes in the shifted fiber $u + \mathcal{F}_{A,b}$ are adjacent to nodes outside of $u + \mathcal{F}_{A,b}$.
The next definition should not be mixed up with the Markov complexity [24].

Definition 3.12. The complexity of a finite set $\mathcal{M} \subset \mathbb{Z}^d$ is $C(\mathcal{M}) := \max_{m \in \mathcal{M}} \|m\|_\infty$.

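Lemma 3.13. Let $A \in \mathbb{Z}^{m \times d}$, let $\mathcal{M} \subset \ker_{\mathbb{Z}}(A)$ be a finite set, and let $b \in \mathbb{N}A$. Then
for all $u \in \mathbb{N}^d$,

$$\partial^u_{\mathcal{M}}(\mathcal{F}_{A,b}) \subseteq u + \bigcup_{j \in \operatorname{supp}(u)} \bigcup_{r=0}^{C(\mathcal{M})} \{w \in \mathcal{F}_{A,b} : w_j = r\}.$$

Proof. Let $v \in \partial^u_{\mathcal{M}}(\mathcal{F}_{A,b})$, then there is $m \in \pm\mathcal{M}$ such that $v + m \in \mathbb{N}^d$ but $v + m \notin
u + \mathcal{F}_{A,b}$. Since $v \in u + \mathcal{F}_{A,b}$, there is $w \in \mathcal{F}_{A,b}$ such that $v = u + w$. The vector
$w + m$ must have a negative entry, since otherwise $w + m \in \mathbb{N}^d$, that is $w + m \in \mathcal{F}_{A,b}$,
which implies $v + m = u + w + m \in u + \mathcal{F}_{A,b}$. Hence, there is $j \in [d]$ such that
$(w + m)_j < 0$. Suppose $j \notin \operatorname{supp}(u)$. Then $(u + w + m)_j = (w + m)_j < 0$, which
contradicts $u + w + m = v + m \in \mathbb{N}^d$. Thus, $j \in \operatorname{supp}(u)$ and $w_j < -m_j$. Since that
means $w_j < C(\mathcal{M})$, the statement follows. □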
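The containment of Lemma 3.13 can be checked directly on a small instance. The following sketch is an illustration under assumptions, not part of the original argument: the helpers `fiber_Ad`, `u_boundary`, and `lemma_3_13_superset` are hypothetical names, and the instance ($A_3$, $\mathcal{M}_3$ from Example 3.9, $b = 3$, $u = (2, 0, 0)$) is chosen arbitrarily.

```python
from itertools import product


def fiber_Ad(d, i):
    """All vectors in N^d with coordinate sum i."""
    return [v for v in product(range(i + 1), repeat=d) if sum(v) == i]


def add(v, w):
    return tuple(x + y for x, y in zip(v, w))


def u_boundary(u, base_fiber, moves):
    """Nodes v in u + F with some m in +-M such that v + m is in N^d but not in u + F."""
    signed = {tuple(m) for m in moves} | {tuple(-x for x in m) for m in moves}
    shifted = {add(u, w) for w in base_fiber}
    boundary = set()
    for v in shifted:
        for m in signed:
            t = add(v, m)
            if min(t) >= 0 and t not in shifted:
                boundary.add(v)
                break
    return boundary


def lemma_3_13_superset(u, base_fiber, moves):
    """u + union over j in supp(u) and 0 <= r <= C(M) of {w in F : w_j = r}."""
    complexity = max(abs(x) for m in moves for x in m)
    support = [j for j, x in enumerate(u) if x > 0]
    return {add(u, w) for w in base_fiber if any(w[j] <= complexity for j in support)}


M3 = [(1, -1, 0), (1, 0, -1)]
F = fiber_Ad(3, 3)
u = (2, 0, 0)
boundary = u_boundary(u, F, M3)
superset = lemma_3_13_superset(u, F, M3)
```

In this instance the boundary consists of the translates of the points of the fiber whose first coordinate vanishes, while the superset from the lemma also contains the translates with first coordinate equal to 1.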

Lemma 3.11 allows us to measure edge-expansion by essentially comparing the growth
of fibers with the growth of their boundary. The idea is to show that the boundary
grows asymptotically slower than the fiber itself. Counting the number of integer points
in a polytope is the subject of Ehrhart theory [12]. Let $P \subset \mathbb{Q}^d$ be a polytope and
consider the map $L_P : \mathbb{N} \to \mathbb{N}$ which counts the integer points in the $i$-th dilation
$iP := \{i \cdot x \in \mathbb{Q}^d : x \in P\}$, i.e.

$$L_P(i) := |(iP) \cap \mathbb{Z}^d|.$$

According to Ehrhart's theorem (cf. [2, Theorem 3.23]), $L_P$ is a quasi-polynomial of
degree $r := \dim(P)$, that is there exist periodic functions $c_0, \dots, c_r : \mathbb{N} \to \mathbb{Q}$ with
integral periods such that

$$L_P(t) = \sum_{s=0}^{r} c_s(t) \, t^s$$

with $c_r$ not identically zero. Here, the dimension of a set is the dimension of its affine
hull. This applies to rays in affine semigroups: Since for any $i \in \mathbb{N}$, the integer points
of $\mathcal{R}_{A,ib}$ are precisely the elements of $\mathcal{F}_{A,ib}$, $L_{\mathcal{R}_{A,b}}(i) = |\mathcal{F}_{A,ib}|$ for all $i \in \mathbb{N}$ and hence
$|\mathcal{F}_{A,ib}|$ grows in $i$ (quasi-)polynomially of degree $\dim(\mathcal{R}_{A,b})$.
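As a sanity check of the polynomial growth along a ray, one can count the fibers of $A_3 = (1, 1, 1)$ from Example 3.9 by brute force and compare with the binomial coefficient $\binom{i+2}{2}$, a polynomial in $i$ of degree $2 = \dim(\mathcal{R}_{A_3,1})$. The helper name `fiber_size` is hypothetical and the enumeration is only intended for small $i$.

```python
from itertools import product
from math import comb


def fiber_size(d, i):
    """Count the vectors in N^d with coordinate sum i, i.e. |F_{A_d, i}| for A_d = (1, ..., 1)."""
    return sum(1 for v in product(range(i + 1), repeat=d) if sum(v) == i)


# Fibers along the ray (i * 1)_i for A_3 = (1, 1, 1): the relaxation is a dilated triangle.
counts = [fiber_size(3, i) for i in range(8)]
closed_form = [comb(i + 2, 2) for i in range(8)]
```

Here the quasi-polynomial is an honest polynomial because the relaxation is an integral simplex; for general right-hand sides the coefficients may fluctuate periodically.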

Remark 3.14. For any integer matrix $A$ and $b \in \mathbb{N}A$, $\dim(\mathcal{R}_{A,b}) \ge \dim(\mathcal{F}_{A,b})$. In
particular, if $\dim(\mathcal{R}_{A,b}) > 0$, then $(|\mathcal{F}_{A,ib}|)_{i \in \mathbb{N}}$ is unbounded. If $A$ is totally unimodular,
then $\mathcal{R}_{A,b} = \operatorname{conv}(\mathcal{F}_{A,b})$ and hence the dimensions of $\mathcal{R}_{A,b}$ and $\mathcal{F}_{A,b}$ coincide.

The sets appearing in Lemma 3.13 are not precisely dilates of polytopes and Ehrhart
theory does not apply directly. Nevertheless, their growth can be bounded as well in
terms of their dimension.

Lemma 3.15. Let $A \in \mathbb{Z}^{m \times d}$, $b \in \mathbb{Z}^m$, and fix integers $j \in [d]$ and $l \in \mathbb{N}$. If for all
$i \in \mathbb{N}_{>0}$, $\mathcal{R}_{A,ib}$ is not completely contained in the hyperplane $H := \{x \in \mathbb{Q}^d : x_j = l\}$,
then there is $C \in \mathbb{N}$ such that the number of integer points in $\mathcal{R}_{A,ib} \cap H$ is bounded
from above by $C \cdot i^{\dim(\mathcal{R}_{A,b}) - 1}$.

Proof. Write $P := \mathcal{R}_{A,b}$ and $r := \dim(P)$. For $i$ large enough, the dimension of $(iP) \cap H$
stabilizes, i.e. there are $r', N \in \mathbb{N}$ such that $r' = \dim(iP \cap H)$ for all $i \ge N$. The
affine space of $iP \cap H$ is completely contained in $H$ whereas the affine space of $iP$
has elements outside of $H$. That implies $r' < r$. Let $A = (a_1, \dots, a_d)$ and $A'$ be the
submatrix of $A$ omitting the $j$-th column, then the (bijective) projection of $iP \cap H$
onto all coordinates different from $j$ is

$$Q_i := \{x \in \mathbb{Q}^{d-1}_{\ge 0} : A'x = ib - l a_j\}.$$

By [32, Proposition 1], there exist finitely many sets $C_1, \dots, C_k$ covering $\mathbb{N}$ such that
for $i \in C_j$, the number of integer points in $Q_i$ is a quasi-polynomial of degree $r'$. □

Lemma 3.16. Let $p(t) = \sum_{s=0}^{r} c_s(t) t^s$ be a quasi-polynomial of degree $r > 0$ and
let $k \in \mathbb{N}$ such that $c_r(k) > 0$. There exist $n \in \mathbb{N}_{>0}$ and $N \in \mathbb{N}$ such that for all
$i \in k + n \cdot \mathbb{N}$ with $i \ge N$, $2p(i) < p(i + ni)$.

Proof. Let $n \ge 2$ such that $c_r(i + ni) = c_r(i)$ for all $i \in \mathbb{N}$ (i.e. if $c_r$ is not a constant,
let $n \ge 2$ be the period of $c_r$). For all $i \in k + n \cdot \mathbb{N}$, $c_r(i + ni) = c_r(i) = c_r(k) > 0$ and

$$p(i + ni) - 2 \cdot p(i) = c_r(k) \left((1 + n)^r - 2\right) i^r + \sum_{s=0}^{r-1} \left(c_s(i + ni)(1 + n)^s - 2c_s(i)\right) i^s.$$

The sum in the term on the right-hand side of this equation is a quasi-polynomial of
degree at most $r - 1$ and the left term on the right-hand side is a polynomial of degree
$r > 0$ whose leading coefficient is positive due to $n \ge 2$ and $r > 0$. Thus, there is
$N \in \mathbb{N}$ such that for all $i \in k + n \cdot \mathbb{N}$ with $i \ge N$,

$$c_r(k) \left((1 + n)^r - 2\right) i^r > - \sum_{s=0}^{r-1} \left(c_s(i + ni)(1 + n)^s - 2c_s(i)\right) i^s,$$

that is $2p(i) < p(i + ni)$. □
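A concrete quasi-polynomial makes the doubling statement of Lemma 3.16 tangible. The sketch below uses the lattice-point count of the dilates of the interval $[0, 1/2]$, which is $p(t) = \lfloor t/2 \rfloor + 1$ with constant leading coefficient $1/2$ and a constant term of period 2; this example and the choices $n = 2$, $N = 3$ are assumptions made for illustration and do not appear in the original text.

```python
def p(t):
    """Number of integers in [0, t/2]: the Ehrhart quasi-polynomial of the interval [0, 1/2]."""
    return t // 2 + 1


n = 2  # any n >= 2 works here because the leading coefficient is constant
N = 3  # threshold found by inspection for this particular p

# The doubling inequality 2 p(i) < p(i + n i) holds from N on ...
holds_from_N = all(2 * p(i) < p(i + n * i) for i in range(N, 500))

# ... but can fail for small i, which is why the lemma needs the threshold N.
failures_below_N = [i for i in range(1, N) if not 2 * p(i) < p(i + n * i)]
```

The small failures illustrate that the lower-order periodic terms matter for small arguments and only the leading term decides the inequality asymptotically.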


The growth of the state space needs to be compared with the growth of the second
largest eigenvalue modulus to disprove rapid mixing, as demonstrated in Example 3.9.
With the following property of a sequence $(b_i)_{i \in \mathbb{N}}$ in $\mathbb{N}A$, however, a lower bound on the
second largest eigenvalue in terms of the parameter $(i)_{i \in \mathbb{N}}$ instead of $(|\mathcal{F}_{A,b_i}|)_{i \in \mathbb{N}}$ suffices:

Definition 3.17. A sequence $(b_i)_{i \in \mathbb{N}}$ in $\mathbb{N}A$ has a meaningful parametrization if there
exists a polynomial $q \in \mathbb{Q}[t]$ such that $|\mathcal{F}_{A,b_i}| \le q(i)$ for all $i \in \mathbb{N}$.

Example 3.18. Consider $A_2$ from Example 3.9, then $|\mathcal{F}_{A_2,i}| = i + 1$. The sequence
$(2^i)_{i \in \mathbb{N}}$ in $\mathbb{N}A_2 = \mathbb{N}$ is not meaningfully parametrized. However, it is a subsequence of
the sequence $(i)_{i \in \mathbb{N}}$, which has a meaningful parametrization.

Proposition 3.19. Let $A \in \mathbb{Z}^{m \times d}$ and let $(b_i)_{i \in \mathbb{N}}$ be a sequence in $\mathbb{N}A$ satisfying
$(\|b_i\|)_{i \in \mathbb{N}} \in \mathcal{O}(i^r)_{i \in \mathbb{N}}$ for some $r \in \mathbb{N}$. Then $(b_i)_{i \in \mathbb{N}}$ has a meaningful parametrization.

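Proof. Denote by $a_1, \dots, a_m$ the rows of $A$. Since $\ker_{\mathbb{Z}}(A) \cap \mathbb{N}^d = \{0\}$, there exist
coefficients $\lambda_1, \dots, \lambda_m \in \mathbb{Q}$ such that $w := \sum_{i=1}^{m} \lambda_i a_i \in \mathbb{Q}^d_{>0}$. In particular, for any
$b \in \mathbb{N}A$ and for any $u \in \mathcal{F}_{A,b}$, $\|u\|_\infty \cdot \min_{i \in [d]} w_i \le w^T u \le m \cdot \|\lambda\|_\infty \cdot \|b\|_\infty$. Thus,
every coordinate of $u$ is bounded by a constant multiple of $\|b\|_\infty$, so that $|\mathcal{F}_{A,b}|$ is bounded
by a polynomial in $\|b\|_\infty$ of degree $d$.
Hence, if $\|b_i\| \le C \cdot i^r$ for all $i \in \mathbb{N}$, then $(b_i)_{i \in \mathbb{N}}$ has a meaningful parametrization. □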

For sequences with a meaningful parametrization, it suffices to bound the edge- 
expansion appropriately from above to show a slow mixing behaviour. 

Lemma 3.20. Let $A \in \mathbb{Z}^{m \times d}$, $\mathcal{M} \subset \ker_{\mathbb{Z}}(A)$ be a finite set, and let $(b_i)_{i \in \mathbb{N}}$ be a sequence
in $\mathbb{N}A$ with meaningful parametrization. If there is an infinite subset $I \subseteq \mathbb{N}$ and $C \in \mathbb{N}$
such that $h(\mathcal{F}_{A,b_i}(\mathcal{M})) \le \frac{C}{i}$ for all $i \in I$, then $(\mathcal{F}_{A,b_i}(\mathcal{M}))_{i \in \mathbb{N}}$ is not rapidly mixing.

Proof. Let $\lambda_i \in [0,1]$ be the second largest eigenvalue modulus of the simple walk on
$\mathcal{F}_{A,b_i}(\mathcal{M})$ and assume that the sequence mixes rapidly. Then there exists a polynomial
$p \in \mathbb{Q}_{\ge 0}[t]$ such that for all $i \in I$,

$$1 - \frac{1}{p(\log |\mathcal{F}_{A,b_i}|)} \ge \lambda_i \ge 1 - \frac{2C}{i},$$

where we have used the assumption on the edge-expansion and Proposition 3.8 to
obtain the lower bound. This implies that for all $i \in I$, $2C \cdot p(\log |\mathcal{F}_{A,b_i}|) \ge i$. However,
since the parametrization is meaningful, there exists a polynomial $q \in \mathbb{Q}[t]$ such that
$|\mathcal{F}_{A,b_i}| \le q(i)$ and thus $p(\log |\mathcal{F}_{A,b_i}|) \le p(\log q(i))$ for $i$ sufficiently large, which gives a
contradiction since $I$ is an infinite subset and hence unbounded. □

We are now ready to prove our main theorem: 

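Proof of Theorem 1.1. Since $(b_i)_{i \in \mathbb{N}}$ is dominated, there is $b \in \mathbb{N}A$ with $\dim(\mathcal{R}_{A,b}) > 0$
such that $b'_i := b_i - i \cdot b \in \mathbb{N}A$ for all $i \in \mathbb{N}$. Moreover, there is $u \in \mathcal{F}_{A,b}$ and a sequence
$(w_i)_{i \in \mathbb{N}}$ in $\mathbb{N}^d$ such that for all $i \in \mathbb{N}$, $w_i \in \mathcal{F}_{A,b'_i}$ and $\operatorname{supp}(w_i) \subseteq \operatorname{supp}(u)$. Clearly,
it suffices to show the theorem for the subsequence $(b_{(C(\mathcal{M})+1)i})_{i \in \mathbb{N}}$ which inherits the
properties of being dominated and being meaningfully parametrized from $(b_i)_{i \in \mathbb{N}}$ due to
the linear re-parametrization. Thus, we replace $b_i$ with $b_{(C(\mathcal{M})+1)i}$, $b$ with $(C(\mathcal{M}) + 1) \cdot b$,
and $b'_i$ with $b'_{(C(\mathcal{M})+1)i}$. Additionally, we replace $w_i$ with $w_{(C(\mathcal{M})+1)i}$ and $u$ with
$(C(\mathcal{M}) + 1) \cdot u$, which does not change the support of $u$. After these changes, we have
$u_j > C(\mathcal{M})$ for all $j \in \operatorname{supp}(u)$, which is needed later in the proof. The Ehrhart
quasi-polynomial $L_{\mathcal{R}_{A,b}}$ has degree $r := \dim(\mathcal{R}_{A,b})$ and by the definition of being dominated, $r > 0$.
Write $L_{\mathcal{R}_{A,b}}(t) = \sum_{s=0}^{r} c_s(t) t^s$ with $c_r$ not identically zero. Since $L_{\mathcal{R}_{A,b}}(i) = |\mathcal{F}_{A,ib}| \ge 0$,
there exists $k \in \mathbb{N}$ such that $c_r(k) > 0$. By Lemma 3.16, there exist $n \in \mathbb{N}_{>0}$ and
$N \in \mathbb{N}$ such that $2|\mathcal{F}_{A,ib}| < |\mathcal{F}_{A,(i+ni)b}|$ for all $i \in (k + n \cdot \mathbb{N}) \cap \mathbb{N}_{\ge N} =: I$. By the choice
of $w_i$ and $u$, $A \cdot (w_{i+ni} + ni \cdot u) = b'_{i+ni} + ni \cdot b = b_{i+ni} - ib$ for all $i \in I$ and hence
$w_{i+ni} + ni \cdot u + \mathcal{F}_{A,ib} \subseteq \mathcal{F}_{A,b_{i+ni}}$. In particular, for any $i \in I$,
$2|\mathcal{F}_{A,ib}| < |\mathcal{F}_{A,(i+ni)b}| \le |\mathcal{F}_{A,b_{i+ni}}|$.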

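For any $i \in I$, set $u_i := w_{i+ni} + ni \cdot u$, then Lemma 3.13 gives

$$|\partial^{u_i}_{\mathcal{M}}(\mathcal{F}_{A,ib})| \le \sum_{j \in \operatorname{supp}(u_i)} \sum_{l=0}^{C(\mathcal{M})} |\{w \in \mathcal{F}_{A,ib} : w_j = l\}|. \qquad (3.1)$$

Since $2|\mathcal{F}_{A,ib}| \le |\mathcal{F}_{A,b_{i+ni}}|$ and $u_i \in \mathcal{F}_{A,b'_{i+ni} + ni \cdot b}$ for all $i \in I$, an application of
Lemma 3.11 yields the upper bound on the edge-expansion of the graph $\mathcal{F}_{A,b_{i+ni}}(\mathcal{M})$:

$$h(\mathcal{F}_{A,b_{i+ni}}(\mathcal{M})) \le \frac{2|\mathcal{M}| \cdot |\partial^{u_i}_{\mathcal{M}}(\mathcal{F}_{A,ib})|}{|\mathcal{F}_{A,ib}|} \le 2|\mathcal{M}| \cdot \frac{\sum_{j \in \operatorname{supp}(u)} \sum_{l=0}^{C(\mathcal{M})} |\{w \in \mathcal{F}_{A,ib} : w_j = l\}|}{|\mathcal{F}_{A,ib}|},$$

where Lemma 3.11 was used in the first inequality and equation (3.1) together with
$\operatorname{supp}(u_i) \subseteq \operatorname{supp}(u)$ in the second. For any $j \in \operatorname{supp}(u)$ and $l \in [C(\mathcal{M})] \cup \{0\}$, let
$H_{j,l} = \{x \in \mathbb{Q}^d : x_j = l\}$ be a hyperplane in $\mathbb{Q}^d$, then for all $i \in I$, the number of integer
points in $(i \cdot \mathcal{R}_{A,b}) \cap H_{j,l}$ is precisely $L_{j,l}(i) := |\{w \in \mathcal{F}_{A,ib} : w_j = l\}|$. Since $u_j > C(\mathcal{M})$ for all
$j \in \operatorname{supp}(u)$, the vector $i \cdot u \in \mathcal{R}_{A,ib} = i \cdot \mathcal{R}_{A,b}$ is not contained in $(i \cdot \mathcal{R}_{A,b}) \cap H_{j,l}$ for all
$l \in [C(\mathcal{M})] \cup \{0\}$ and all $i \in I$. Lemma 3.15 then implies that for all $j \in \operatorname{supp}(u)$ and
$l \in [C(\mathcal{M})] \cup \{0\}$, there is a constant $C_{j,l} \in \mathbb{N}$ such that $L_{j,l}(i) \le C_{j,l} \cdot i^{r-1}$ for all $i \in \mathbb{N}$.
Let $C \in \mathbb{N}$ be the maximum of all $C_{j,l}$'s, then

$$h(\mathcal{F}_{A,b_{i+ni}}(\mathcal{M})) \le 2|\mathcal{M}| \cdot \frac{\sum_{j \in \operatorname{supp}(u)} \sum_{l=0}^{C(\mathcal{M})} L_{j,l}(i)}{L_{\mathcal{R}_{A,b}}(i)} \le 2|\mathcal{M}| \cdot \frac{|\operatorname{supp}(u)| \cdot (C(\mathcal{M}) + 1) \cdot C \cdot i^{r-1}}{c_r(k) i^r + \sum_{s=0}^{r-1} c_s(i) i^s}.$$

Since $|\mathcal{M}|$, $C(\mathcal{M})$, $n$, $C$, and $\operatorname{supp}(u)$ are constants which are independent of $i \in I$ and
since $c_r(k) > 0$, $(h(\mathcal{F}_{A,b_{i+ni}}(\mathcal{M})))_{i \in I} \in \mathcal{O}(\frac{1}{i})_{i \in I}$. It follows from Proposition 3.8
that the sequence is no expander and if $(b_i)_{i \in \mathbb{N}}$ has a meaningful parametrization, then
Lemma 3.20 implies that the sequence cannot mix rapidly. □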


Remark 3.21. Basically, Theorem 1.1 shows the existence of $C, C' \in \mathbb{N}_{\ge 1}$ such that
the second largest eigenvalue modulus $\lambda_i$ of the simple walk on $\mathcal{F}_{A,b_i}(\mathcal{M})$ satisfies
$\lambda_i \ge 1 - \frac{C}{i}$ for all $i \in C' \cdot \mathbb{N}$. Here, the constant $C'$ is due to the many boundary effects
and the fluctuations in Ehrhart quasi-polynomials. For instance, when $(b_i)_{i \in \mathbb{N}}$ is a ray
instead of a dominated sequence and when $A$ is totally unimodular, then $\lambda_i \ge 1 - \frac{C}{i}$
holds for all $i \in \mathbb{N}_{\ge C''}$ for a constant $C'' \in \mathbb{N}$. Also, when $(b_i)_{i \in \mathbb{N}}$ is a sequence with
$\limsup_{i \in \mathbb{N}} H_A(b_i) = \infty$, then $\limsup_{i \in \mathbb{N}} \lambda_i = 1$ due to Theorem 1.1 and Proposition 3.4.

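Corollary 3.22. Let $A \in \mathbb{Z}^{m \times d}$ and let $\mathcal{M} \subset \ker_{\mathbb{Z}}(A)$ be a Markov basis for $A$. Let
$(b_i)_{i \in \mathbb{N}}$ from $\mathbb{N}A$ have a meaningful parametrization and suppose there is $p \in \mathbb{Q}[t]$ with
$p(\mathbb{N}) \subseteq \mathbb{N}$ such that $(b_{p(i)})_{i \in \mathbb{N}}$ is dominated. Then $(\mathcal{F}_{A,b_i}(\mathcal{M}))_{i \in \mathbb{N}}$ is not rapidly mixing.

Proof. Clearly, there exists $C \in \mathbb{N}_{>0}$ such that $p(C \cdot (i + 1)) > p(C \cdot i)$ for all $i$ sufficiently
large. Let $b'_i := b_{p(C \cdot i)}$, then $(b'_i)_{i \in \mathbb{N}}$ is a subsequence of $(b_i)_{i \in \mathbb{N}}$ and hence it suffices to
show that $(\mathcal{F}_{A,b'_i}(\mathcal{M}))_{i \in \mathbb{N}}$ is not rapidly mixing. Since $|\mathcal{F}_{A,b'_i}| = |\mathcal{F}_{A,b_{p(C \cdot i)}}| \le q(p(C \cdot i))$
for a polynomial $q \in \mathbb{Q}[t]$, $(b'_i)_{i \in \mathbb{N}}$ has a meaningful parametrization. By assumption,
there exists $b \in \mathbb{N}A$ such that $b_{p(i)} - i \cdot b \in \mathbb{N}A$ for all $i \in \mathbb{N}$ and hence $b'_i - i \cdot C \cdot b \in \mathbb{N}A$
for all $i \in \mathbb{N}$. Theorem 1.1 then implies that $(\mathcal{F}_{A,b'_i}(\mathcal{M}))_{i \in \mathbb{N}}$ is not rapidly mixing. □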









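Corollary 3.23. Let $A \in \mathbb{Z}^{m \times d}$, $\mathcal{M} \subset \ker_{\mathbb{Z}}(A)$ a Markov basis for $A$, and $b \in \mathbb{N}A$
with $\dim(\mathcal{R}_{A,b}) > 0$. Suppose that $(b_i)_{i \in \mathbb{N}}$ has $(p(k) \cdot b)_{k \in \mathbb{N}}$ as subsequence for some
non-constant $p \in \mathbb{Q}[t]$. Then $(\mathcal{F}_{A,b_i}(\mathcal{M}))_{i \in \mathbb{N}}$ is not rapidly mixing.

Proof. Let $C, N \in \mathbb{N}$ such that $p(C \cdot k) \ge k$ for $k \ge N$. Then $(p(C \cdot k) - k) \cdot b \in \mathbb{N}A$
for $k \ge N$ and hence $(p(C \cdot k) \cdot b)_{k \ge N}$ is dominated. Clearly, $(p(C \cdot k) \cdot b)_{k \ge N}$ is
meaningfully parametrized because of Proposition 3.19 and hence the statement is a
consequence of Theorem 1.1. □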


Remark 3.24. Let $\mathcal{M} = \{m_1, \dots, m_k\} \subset \ker_{\mathbb{Z}}(A)$ be a Markov basis for $A$. Extending
$\mathcal{M}$ by adding a finite number of $\mathbb{Z}$-linear combinations of $m_1, \dots, m_k$ may improve the
mixing behaviour in one particular fiber, but since the complexity of the new set of
moves is still finite, this cannot lead to rapid mixing asymptotically due to Theorem 1.1.
For instance, this implies that the Graver basis of $A$ has the same asymptotic mixing
behaviour as any other finite Markov basis for $A$.


Example 3.25. Let $A_{n \times m}$ be the constraint matrix of the $n \times m$-independence model [10].
Elements in the kernel of $A_{n \times m}$ can be written as $n \times m$ contingency tables whose row
and column sums are zero and the basic moves $\mathcal{M}_{n \times m}$ are a minimal Markov basis for
$A_{n \times m}$. These are all moves in the orbit of

$$\begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ -1 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}$$

under the group action of $S_n \times S_m$ on the rows and columns. Using this Markov basis
to explore the set of contingency tables was suggested in [9]. Elements in the same
fiber of $A_{n \times m}$ have the same $\|\cdot\|_1$-norm, namely $\frac{1}{2}\|b\|_1$, since the row-space of $A_{n \times m}$
contains the vector $(1, \dots, 1) \in \mathbb{Z}^{n \cdot m}$. Observe that the invariant $\frac{1}{2}\|b\|_1$ is precisely
the sample size of goodness-of-fit tests for the independence model [10, Chapter 1.1].
Thus, we obtain sequences $(b_i)_{i \in \mathbb{N}}$ with a meaningful parametrization whenever the

sample size grows polynomially in $i$ by Proposition 3.19. Assume that $n \ge m$ and that
$b_i \ge \frac{s}{t} \cdot i \cdot (1, \dots, 1)^T$ for fixed $s, t \in \mathbb{N}$, then $b_{i \cdot t \cdot n} - i \cdot s \cdot (n, \dots, n, m, \dots, m)^T \in \mathbb{N}A_{n \times m}$
(where $n, \dots, n$ denotes the $m$ column sums and $m, \dots, m$ denotes the $n$ row sums) and
it follows that $(b_{i \cdot t \cdot n})_{i \in \mathbb{N}}$ is dominated since the fiber of $(n, \dots, n, m, \dots, m)^T$ contains
an element with full support. Corollary 3.22 shows that the simple fiber walk on
$(\mathcal{F}_{A_{n \times m},b_i}(\mathcal{M}_{n \times m}))_{i \in \mathbb{N}}$ is not rapidly mixing. These assumptions hold for instance when
$n = m$ and $b_i := (i, \dots, i) \in \mathbb{N}^{2n}$, even though the node-connectivity under the basic
moves $\mathcal{M}_{n \times n}$ is best-possible due to [23, Theorem 2.9].
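The basic moves of Example 3.25 are easy to generate explicitly. The following is a minimal sketch with a hypothetical helper name `basic_moves`; it lists one representative per sign (the orbit itself also contains the negatives) and stores each move as a tuple of rows.

```python
from itertools import combinations


def basic_moves(n, m):
    """n x m tables with +1 at (i, j), (k, l) and -1 at (i, l), (k, j), one per sign."""
    moves = []
    for i, k in combinations(range(n), 2):
        for j, l in combinations(range(m), 2):
            table = [[0] * m for _ in range(n)]
            table[i][j] = table[k][l] = 1
            table[i][l] = table[k][j] = -1
            moves.append(tuple(tuple(row) for row in table))
    return moves


def margins(table):
    """Row sums followed by column sums of a table."""
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    return rows + cols


moves_3x3 = basic_moves(3, 3)
```

Every generated table has vanishing row and column sums, so adding it to a contingency table keeps the margins, and hence the fiber, fixed whenever the result stays non-negative.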


Using Markov chain comparison methods as in [11], Theorem 1.1 can be used to
show that related random walks on fibers are not rapidly mixing as well. We show that
the random walk on a fiber where we sample uniformly from the full Markov basis has
asymptotically the same mixing behaviour as the random walk which samples from
the set of applicable moves locally.

Proposition 3.26. Let $G = (V, E)$ be a connected $d$-regular graph and let $G'$ be the
graph obtained from $G$ after removing all its loops. Let $\lambda$ and $\lambda'$ be the second largest
eigenvalues of $S^G$ and $S^{G'}$ respectively, then $(1 - \lambda') \le d \cdot (1 - \lambda)$.

Proof. Let $m$ be the number of edges and $\delta$ be the minimal degree in $G'$ respectively.
If $G'$ is bipartite, then $\lambda' = 1$ and the claim holds. Otherwise, the stationary
distribution $\pi$ of $S^G$ is the uniform distribution on $V$ whereas the stationary distribution
of $S^{G'}$ is $\pi' : V \to [0,1]$, $\pi'(u) = \deg_{G'}(u) \cdot (2m)^{-1}$. We use [31, Lemma 2.5]. For
any $u \in V$,

$$\frac{\pi'(u)}{\pi(u)} = \frac{|V| \cdot \deg_{G'}(u)}{2m} \ge \frac{|V| \cdot \delta}{2m}$$

and for any distinct $u, v \in V$,

$$\pi'(u) \cdot S^{G'}_{u,v} = \frac{|V| \cdot d}{2m} \cdot \pi(u) \cdot S^{G}_{u,v}.$$

Since the diagonal entries of $S^{G'}$ are zero, [31, Lemma 2.5] implies $(1 - \lambda') \le d \cdot \frac{1}{\delta} \cdot
(1 - \lambda)$. Since $G$ is connected, $G'$ has no isolated nodes and hence $\frac{1}{\delta} \le 1$. □

Remark 3.27. Fix a constraint matrix $A \in \mathbb{Z}^{m\times d}$ and a Markov basis $\mathcal{M}$ of $A$ and
consider the random walk $\mathcal{S}'$ on $\mathcal{F}_{A,b}(\mathcal{M})$ which samples for any $v \in \mathcal{F}_{A,b}$ uniformly
from the set of all applicable moves $\{m \in \pm\mathcal{M} : v + m \in \mathbb{N}^d\}$ to explore the fiber.
This modified random walk is precisely the simple walk on the graph obtained from
$\mathcal{F}_{A,b}(\mathcal{M})$ after removing all its loops. In particular, this random walk has no rejections.
However, Proposition 3.26 implies that whenever $(\mathcal{F}_{A,b_i}(\mathcal{M}))_{i\in\mathbb{N}}$ is not rapidly mixing,
this modified random walk is not rapidly mixing either.
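The two walks compared in Remark 3.27 can be made concrete on a very small fiber. The following is a minimal sketch under stated assumptions: the matrix A = (1 1 1), the right-hand side b = 2, the two moves, and all function names are hypothetical choices for illustration and do not come from the text. It builds the transition matrix of the walk with rejections and of the walk that samples only applicable moves, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def fiber(A, b):
    """Brute-force {u in N^d : A u = b}.
    The search box [0, max(b)]^d is only valid here because every entry of A is 1."""
    d = len(A[0])
    box = range(max(b) + 1)
    return [u for u in product(box, repeat=d)
            if all(sum(a * x for a, x in zip(row, u)) == bi for row, bi in zip(A, b))]

def transition_matrices(F, moves):
    """Return (P, Q).
    P: propose a uniform move from +-M and stay put if it leaves the fiber.
    Q: sample uniformly among the applicable moves only (no rejections)."""
    signed = list(moves) + [tuple(-x for x in m) for m in moves]
    index = {u: i for i, u in enumerate(F)}
    n = len(F)
    P = [[Fraction(0)] * n for _ in range(n)]
    Q = [[Fraction(0)] * n for _ in range(n)]
    for u, i in index.items():
        targets = [tuple(x + y for x, y in zip(u, m)) for m in signed]
        applicable = [v for v in targets if v in index]
        for v in targets:
            # a rejected proposal becomes a loop at u
            P[i][index.get(v, i)] += Fraction(1, len(signed))
        for v in applicable:
            Q[i][index[v]] += Fraction(1, len(applicable))
    return P, Q

# Hypothetical toy instance: A = (1 1 1), b = 2, moves e1 - e2 and e2 - e3.
F = fiber([[1, 1, 1]], [2])
P, Q = transition_matrices(F, [(1, -1, 0), (0, 1, -1)])
corner = F.index((2, 0, 0))
print(P[corner][corner], Q[corner][corner])
```

At the corner (2, 0, 0) only one of the four signed moves is applicable, so the first walk stays put with probability 3/4 while the second walk never does. This is the loop removal that Proposition 3.26 quantifies.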

4. Constructing expander graphs on fibers 

The message from the previous section is that the moves in a Markov basis do not
suffice to provide good mixing behaviour asymptotically. A possible way out is to
adapt the Markov basis appropriately so that its complexity grows with the size of
the right-hand entries. This can be achieved by adding a varying number of $\mathbb{Z}$-linear
combinations of the moves in a way that the edge-expansion of the resulting graph
can be controlled. However, a growth of the set of allowed moves comes along with an
increase of the number of loops, i.e. an increase of the rejection rate of the walk. Let
$A \in \mathbb{Z}^{m\times d}$ be a matrix, $\mathcal{M} = \{m_1,\dots,m_k\} \subset \ker_{\mathbb{Z}}(A)$ be a Markov basis for $A$, and
$b \in \mathbb{N}A$. For $l \in \mathbb{N}$, let

$$\mathcal{M}(l) := \left\{ \sum_{i=1}^{k} \lambda_i m_i \;:\; \lambda_1,\dots,\lambda_k \in \mathbb{Z},\ \sum_{i=1}^{k} |\lambda_i| \le l \right\}$$
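and define $d^{\mathcal{M}}_{b} := \operatorname{diam}(\mathcal{F}_{A,b}(\mathcal{M}))$ and $\mathcal{M}^b := \mathcal{M}(d^{\mathcal{M}}_{b})$. Using $\mathcal{M}^b$ instead of $\mathcal{M}$ as
a set of allowed moves, the corresponding fiber graph $\mathcal{F}_{A,b}(\mathcal{M}^b)$ is the complete graph
on $\mathcal{F}_{A,b}$. We discuss in Remark 4.4 how moves from $\mathcal{M}^b$ can be sampled uniformly.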
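The transition matrix of the simple walk on $\mathcal{F}_{A,b}(\mathcal{M}^b)$ is

$$\frac{1}{|\mathcal{M}^b|} \left( \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 1 \end{pmatrix} + \begin{pmatrix} |\mathcal{M}^b| - |\mathcal{F}_{A,b}| & & 0 \\ & \ddots & \\ 0 & & |\mathcal{M}^b| - |\mathcal{F}_{A,b}| \end{pmatrix} \right).$$
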

In particular, its second largest eigenvalue modulus is $1 - \frac{|\mathcal{F}_{A,b}|}{|\mathcal{M}^b|}$ and hence the next
proposition is immediate.

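Proposition 4.1. Let $A \in \mathbb{Z}^{m\times d}$, $\mathcal{M} \subset \ker_{\mathbb{Z}}(A)$ be a Markov basis for $A$ and $(b_i)_{i\in\mathbb{N}}$
a sequence in $\mathbb{N}A$. Suppose there exists $r \in \mathbb{N}$ such that $(|\mathcal{F}_{A,b_i}|)_{i\in\mathbb{N}} \in \Omega(i^r)_{i\in\mathbb{N}}$ and
$(|\mathcal{M}^{b_i}|)_{i\in\mathbb{N}} \in \mathcal{O}(i^r)_{i\in\mathbb{N}}$, then $(\mathcal{F}_{A,b_i}(\mathcal{M}^{b_i}))_{i\in\mathbb{N}}$ is an expander.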
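The eigenvalue formula behind Proposition 4.1 can be checked by hand in a one-dimensional toy case. The sketch below is illustrative only and rests on assumptions not made in the text: it takes A = (1 1), b = n and the single move (1, -1), so that the fiber has n + 1 elements, the fiber graph has diameter n, and the adapted set of moves has 2n + 1 elements. The function names are hypothetical. The walk on the complete fiber graph then has off-diagonal entries one over the number of moves, and the diagonal is filled so that rows sum to one.

```python
from fractions import Fraction

def complete_walk_matrix(fiber_size, num_moves):
    """Simple walk on the complete fiber graph: each of the other fiber elements is hit
    by exactly one move, every remaining move (including the zero move) is a loop."""
    off = Fraction(1, num_moves)
    diag = Fraction(num_moves - fiber_size + 1, num_moves)
    return [[diag if i == j else off for j in range(fiber_size)]
            for i in range(fiber_size)]

def apply(P, x):
    """Matrix-vector product with exact rational arithmetic."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

n = 5                                   # assumed right-hand side of the toy instance
size_F, size_M = n + 1, 2 * n + 1       # |fiber| and |adapted moves| in the toy instance
P = complete_walk_matrix(size_F, size_M)
slem = 1 - Fraction(size_F, size_M)     # claimed second largest eigenvalue modulus
x = [1, -1] + [0] * (size_F - 2)        # a vector orthogonal to the constant vector
print(apply(P, x) == [slem * xi for xi in x])
```

Any vector orthogonal to the constant vector is an eigenvector for the same eigenvalue, which is why a single test vector suffices in this sketch.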

To make use of Proposition 4.1, the growths of the fibers and the adapted Markov
bases have to be compared. Again, Ehrhart's theory applies to compute the growth of
certain fiber sequences. The asymptotic growth of $(|\mathcal{M}^{b_i}|)_{i\in\mathbb{N}}$ depends on the growth of the
diameter of $\mathcal{F}_{A,b_i}(\mathcal{M})$. Hence, we first want to understand how the number of elements
in $\mathcal{M}(l)$ grows as a function of $l \in \mathbb{N}$.

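Lemma 4.2. Let $\mathcal{M} = \{m_1,\dots,m_k\} \subset \mathbb{Z}^d$, then $(|\mathcal{M}(l)|)_{l\in\mathbb{N}} \in \mathcal{O}(l^{\operatorname{rank}(\mathcal{M})})_{l\in\mathbb{N}}$.

Proof. We identify the finite set $\mathcal{M}$ with the integer matrix $(m_1,\dots,m_k) \in \mathbb{Z}^{d\times k}$.
Denote the $k$-dimensional cross-polytope by $P := \{x \in \mathbb{Q}^k : \|x\|_1 \le 1\}$ and let
$P' := \{\mathcal{M} \cdot x : x \in P\}$ be its image in $\mathbb{Q}^d$ under $\mathcal{M}$. With this, we can write
$\mathcal{M}(l) = \{\mathcal{M} \cdot x : x \in (l \cdot P) \cap \mathbb{Z}^k\}$ and hence $\mathcal{M}(l) \subseteq (l \cdot P') \cap \mathbb{Z}^d$. Since $P'$ is a
polytope, Ehrhart's theorem [2, Theorem 3.23] gives $|(l \cdot P') \cap \mathbb{Z}^d| \le C \cdot l^{\dim(P')}$ for
some $C \in \mathbb{Q}_{>0}$ and since $\dim(P') = \operatorname{rank}(\mathcal{M})$, the claim follows. □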
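The growth bound of Lemma 4.2 can be observed by enumerating the adapted moves directly for small parameters. The following sketch is a brute-force illustration, not a practical algorithm. The three moves are an assumed example with k = 3 and rank 2, and the function name is hypothetical.

```python
from itertools import product

def adapted_moves(moves, l):
    """All integer combinations sum(lam_i * m_i) with sum(|lam_i|) <= l, as a set."""
    k, d = len(moves), len(moves[0])
    result = set()
    for lam in product(range(-l, l + 1), repeat=k):
        if sum(abs(c) for c in lam) <= l:
            result.add(tuple(sum(c * m[j] for c, m in zip(lam, moves))
                             for j in range(d)))
    return result

# Assumed example: three moves of rank 2, since the third is the sum of the first two.
moves = [(1, -1, 0), (0, 1, -1), (1, 0, -1)]
sizes = [len(adapted_moves(moves, l)) for l in range(1, 6)]
print(sizes)
```

Although the coefficient vectors range over a three-dimensional cross-polytope, every resulting move has the form x(1, -1, 0) + y(0, 1, -1) with |x|, |y| at most l, so the sizes stay below (2l + 1)^2. This matches the quadratic growth predicted by the rank.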

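Corollary 4.3. Let $A \in \mathbb{Z}^{m\times d}$ and let $\mathcal{M} \subset \ker_{\mathbb{Z}}(A)$ be a Markov basis for $A$. Let
$(b_i)_{i\in\mathbb{N}}$ be a sequence in $\mathbb{N}A$ such that $(|\mathcal{F}_{A,b_i}|)_{i\in\mathbb{N}} \in \Omega(i^{\dim(\ker_{\mathbb{Q}}(A))})_{i\in\mathbb{N}}$ and $(d^{\mathcal{M}}_{b_i})_{i\in\mathbb{N}} \in
\mathcal{O}(i)_{i\in\mathbb{N}}$. Then $(\mathcal{F}_{A,b_i}(\mathcal{M}^{b_i}))_{i\in\mathbb{N}}$ is an expander.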

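Proof. Let $r := \dim(\ker_{\mathbb{Q}}(A))$. It suffices to show that $|\mathcal{M}^{b_i}| \le C \cdot i^r$ for a constant
$C \in \mathbb{Q}_{>0}$ since the statement follows then from Proposition 4.1. Since $\mathcal{M}$ is a Markov
basis for $A$, $\operatorname{rank}(\mathcal{M}) = r$ and thus Lemma 4.2 implies that $|\mathcal{M}(l)| \le C_1 \cdot l^r$ for a
constant $C_1 \in \mathbb{Q}_{>0}$. The assumption implies that there exists $C_2 \in \mathbb{Q}_{>0}$ such that
$d^{\mathcal{M}}_{b_i} \le C_2 \cdot i$ for all $i \in \mathbb{N}$. Then, $|\mathcal{M}^{b_i}| = |\mathcal{M}(d^{\mathcal{M}}_{b_i})| \le |\mathcal{M}(C_2 \cdot i)| \le C_1 \cdot C_2^r \cdot i^r$. □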
Expanders are not per se fast, and Corollary 4.3 is an asymptotic statement. That
means, for a given matrix $A \in \mathbb{Z}^{m\times d}$, a given Markov basis $\mathcal{M} \subset \ker_{\mathbb{Z}}(A)$, and a
right-hand side $b \in \mathbb{N}A$, we know by Theorem 1.1 that the second largest eigenvalue
modulus of the simple walk that uses $\mathcal{M}$ can be arbitrarily close to 1. On the other
hand, since G by [27], the second largest eigenvalue modulus of the 

simple walk that uses the adapted Markov basis can be bounded away from 1
strictly. Thus, there exists a threshold $i_0 \in \mathbb{N}$ such that the adapted Markov basis is
faster than the conventional Markov basis on $\mathcal{F}_{A,i\cdot b}$ for $i \ge i_0$. The exact value of $i_0$
depends on the hidden constants in the asymptotic formulations of Corollary 4.3 and
can be quite small, as in Figure 2, but also very large so that the advantages of the
adapted Markov bases may pay off only for large right-hand sides.

Remark 4.4. Running the simple walk on $\mathcal{F}_{A,b}(\mathcal{M}(l))$ for some $l \in \mathbb{N}$ requires
sampling from $\mathcal{M}(l)$ uniformly and hence a good understanding of this set is necessary.
Basically, we shift the problem of sampling from $\mathcal{F}_{A,b}$ for all $b \in \mathbb{N}A$ where $\mathcal{F}_{A,b}(\mathcal{M})$
has diameter $l$ to the problem of sampling from $\mathcal{M}(l)$, which can be seen as some
kind of rejection sampling from a larger set $u + \mathcal{M}(l) \supseteq \mathcal{F}_{A,b}$. For large fibers, one
applicable move $m \in \mathcal{M}(l)$ suffices to obtain a sample $u + m \in \mathcal{F}_{A,b}$ that is very
close to uniform. Write $\mathcal{M} = \{m_1,\dots,m_k\}$ and $r := \operatorname{rank}(\mathcal{M})$. When $r = k$, then
an element $\lambda$ picked uniformly from $\{u \in \mathbb{Z}^k : \|u\|_1 \le l\}$ gives rise to an element
$\mathcal{M} \cdot \lambda$ that is uniformly generated from $\mathcal{M}(l)$. This is not the case when $r < k$. One
approach to sample from $\mathcal{M}(l)$ uniformly in this case is to first compute a lattice
basis $\mathcal{B} := \{b_1,\dots,b_r\} \subset \mathbb{Z}^d$ of $\mathcal{M} \cdot \mathbb{Z}^k$ in order to get rid of relations among the
moves from $\mathcal{M}$. Then, we compute for every $i \in [k]$ coefficients $\lambda^i_1,\dots,\lambda^i_r$ such that
$m_i = \sum_{j=1}^{r} \lambda^i_j b_j$. For $C := \sum_{j=1}^{r} \max_{i\in[k]} |\lambda^i_j|$, we have $\mathcal{M}(l) \subseteq \mathcal{B}(C \cdot l)$. Thus, after
sampling coefficients $\lambda$ from $\{u \in \mathbb{Z}^r : \|u\|_1 \le C \cdot l\}$ uniformly, we obtain a move $\mathcal{B} \cdot \lambda$
that is sampled uniformly from a superset of $\mathcal{M}(l)$. Since $|\mathcal{B}(C \cdot l)|$ grows as $\mathcal{O}(l^r)$,
Proposition 4.1 remains valid. Sampling from the cross-polytope $\{u \in \mathbb{Z}^r : \|u\|_1 \le C \cdot l\}$
can be done with the heat-bath method as studied in [27], which is fast for $l \to \infty$.
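One step of the walk described in Remark 4.4 might be sketched as follows. This is not the heat-bath method of [27]: for simplicity the coefficient vector is drawn by plain rejection from the bounding box of the cross-polytope, which is uniform but only practical for small k. The function names, the starting point and the moves are hypothetical, and the moves are assumed to be linearly independent (the case r = k), so that a uniform coefficient vector yields a uniform adapted move.

```python
import random

def sample_cross_polytope(k, radius, rng):
    """Uniform integer point of {x in Z^k : ||x||_1 <= radius},
    obtained by rejection from the box [-radius, radius]^k."""
    while True:
        x = [rng.randint(-radius, radius) for _ in range(k)]
        if sum(abs(c) for c in x) <= radius:
            return x

def step(u, moves, l, rng):
    """Draw lam, form the move sum(lam_i * m_i), and accept it only if
    u plus the move stays in N^d; otherwise remain at u."""
    lam = sample_cross_polytope(len(moves), l, rng)
    move = [sum(c * m[j] for c, m in zip(lam, moves)) for j in range(len(u))]
    v = [a + s for a, s in zip(u, move)]
    return v if min(v) >= 0 else list(u)

# Assumed toy instance: fiber of A = (1 1 1) with b = 3, two independent moves, l = 3.
rng = random.Random(0)
u = [2, 1, 0]
for _ in range(10):
    u = step(u, [(1, -1, 0), (0, 1, -1)], 3, rng)
print(u)
```

Because the moves lie in the kernel of A, every accepted step stays in the fiber, and the non-negativity test is the only rejection that can occur.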

Example 4.5. The constraint matrix An\n of the independence model (Example 3.25) 
is totally unimodular and hence = dim(J74„|„,i„) where G Z”+"' is the 

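vector with all entries equal to 1. It was shown in [23, Proposition 2.10] that the diameter
of $\mathcal{F}_{A_{n\times n},\, i\cdot b_n}(\mathcal{M}_{n\times n})$ is $(n-1)i$. In particular, for fixed $n \in \mathbb{N}$, the diameter grows
linearly in $i$ and hence Corollary 4.3 yields that the sequence $(\mathcal{F}_{A_{n\times n},\, i\cdot b_n}(\mathcal{M}_{n\times n}^{i\cdot b_n}))_{i\in\mathbb{N}}$ is
an expander.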

Example 4.6. Assume $d > 2$ and consider $A_d$ and $\mathcal{M}_d$ from Example 3.9. It is not
hard to see that the graph-distance between any two nodes $u, v \in \mathcal{F}_{A_d,i}(\mathcal{M}_d)$ is at most
$\|u - v\|_1$. Since the maximal $\|\cdot\|_1$-distance of two elements in $\mathcal{F}_{A_d,i}$ is $2i$, the diameter
of $\mathcal{F}_{A_d,i}(\mathcal{M}_d)$ is $2i$. Hence, $(\mathcal{F}_{A_d,i}(\mathcal{M}_d(2i)))_{i\in\mathbb{N}}$ is an expander.
Figure 2. The SLEM of the simple walk on $\mathcal{F}_{A_3,i}$ using moves from the
conventional Markov basis and the adapted moves $\mathcal{M}_3(2i)$.

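Example 4.7. For $k \in \mathbb{N}$, let $I_k$ be the identity matrix in $\mathbb{Z}^{k\times k}$, $\mathbf{1}_k$ be the $k$-dimensional
vector with all entries equal to 1 and define the matrix

$$H_k := \begin{pmatrix} I_k & I_k & 0 & 0 & -\mathbf{1}_k & 0 \\ 0 & 0 & I_k & I_k & 0 & -\mathbf{1}_k \\ 0 & 0 & 0 & 0 & 1 & 1 \end{pmatrix} \in \mathbb{Z}^{(2k+1)\times(4k+2)}. \tag{4.1}$$

It was shown in [15, Theorem 2] that the reduced lexicographic Gröbner basis $\mathcal{G}_k$ of
$H_k$ is $\{e_i - e_{k+i} : i \in \{1,\dots,k, 2k+1,\dots,3k\}\}$ together with the move

$$(0,\dots,0,1,\dots,1,0,\dots,0,-1,\dots,-1,1,-1)^T.$$

With [15, Section 4], it is easy to show that for any $k \in \mathbb{N}$, the diameter of $\mathcal{F}_{H_k,\, i\cdot e_{2k+1}}(\mathcal{G}_k)$
is $(2k+1)i$. Thus, $(\mathcal{F}_{H_k,\, i\cdot e_{2k+1}}(\mathcal{G}_k((2k+1)i)))_{i\in\mathbb{N}}$ is an expander.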
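The moves listed in Example 4.7 can be checked to lie in the kernel of the matrix for small k. The sketch below assumes a block layout that is inferred from the stated dimension (2k+1) x (4k+2) rather than given explicitly here: the first k rows read x_i + x_{k+i} - x_{4k+1} = 0, the next k rows read x_{2k+i} + x_{3k+i} - x_{4k+2} = 0, and the last row reads x_{4k+1} + x_{4k+2}. The function names are hypothetical. The check confirms kernel membership only and does not verify the Gröbner basis property.

```python
def H(k):
    """Assumed block layout of H_k, returned as a list of rows."""
    width = 4 * k + 2
    rows = []
    for i in range(k):                      # rows 1..k
        r = [0] * width
        r[i] = r[k + i] = 1
        r[4 * k] = -1
        rows.append(r)
    for i in range(k):                      # rows k+1..2k
        r = [0] * width
        r[2 * k + i] = r[3 * k + i] = 1
        r[4 * k + 1] = -1
        rows.append(r)
    last = [0] * width                      # row 2k+1
    last[4 * k] = last[4 * k + 1] = 1
    rows.append(last)
    return rows

def groebner_moves(k):
    """The moves e_i - e_{k+i} for i in {1..k, 2k+1..3k} plus the one long move."""
    width = 4 * k + 2
    moves = []
    for i in list(range(k)) + list(range(2 * k, 3 * k)):   # zero-based indices
        m = [0] * width
        m[i], m[k + i] = 1, -1
        moves.append(m)
    moves.append([0] * k + [1] * k + [0] * k + [-1] * k + [1, -1])
    return moves

def in_kernel(rows, m):
    return all(sum(a * x for a, x in zip(r, m)) == 0 for r in rows)

print(all(in_kernel(H(k), m) for k in range(1, 5) for m in groebner_moves(k)))
```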

5. Scaling the dimension 

Markov bases of constraint matrices coming from statistical problems are often
parametrized and they can be stated explicitly for any parameter. For instance, the
basic moves $\mathcal{M}_{n\times n}$ of the independence model (Example 3.25) form a Markov basis for
$A_{n\times n}$ for every $n \in \mathbb{N}$. Thus, varying the parameter $n$ provides fiber graphs where the
set of moves is adapted canonically.

Remark 5.1. Let $b_n := (1,\dots,1) \in \mathbb{Z}^{n+n}$, then the elements of $\mathcal{F}_{A_{n\times n},\, b_n}$ can be identified
with the elements of the symmetric group $S_n$ on $[n]$. Finding a set of generators such
that the corresponding Cayley graph on $S_n$ is an expander is an active research field
in group theory, see for instance [19]. In [8], it was shown that the simple walk on the
Cayley graph of $S_n$ that uses the transpositions mixes rapidly in $\frac{1}{2} n \log n$ many steps.
Inspired by shuffling a deck of $n$ cards, a random walk on $S_n$ that uses riffle shuffles
was studied in [1] and shown to be rapidly mixing as well.
Parametric descriptions of Markov bases can be arbitrarily complicated in general,
since by the Universality theorem [5], any integer vector appears as a subvector of a
Markov basis element of the three-way no interaction model, when the parameters are
large enough. Unlike in fixed dimension, where the Markov basis is fixed, the
size of the Markov basis is important in the convergence analysis when the dimension
varies because the local sampling process of a move can be computationally challenging
as the Markov basis becomes larger. The trade-off between an easily accessible set of
moves and a corresponding random walk that has good mixing properties shows the
realms of fiber walks in practice. The next proposition illustrates this for $H_k$ from
Example 4.7, where the overwhelming number of moves in its parametric Graver basis
$\mathrm{Gr}_k$ slows the chain down for $k \to \infty$, despite the fact that the edge-connectivity of
these fibers is best-possible [15, Theorem 4]. A description of $\mathrm{Gr}_k$ is in [15, Theorem 2].

Proposition 5.2. The sequence $(\mathcal{F}_{H_k,\, e_{2k+1}}(\mathrm{Gr}_k))_{k\in\mathbb{N}}$ is not rapidly mixing.

Proof. According to [15, Section 4], $\mathcal{F}_{H_k,\, e_{2k+1}}(\mathrm{Gr}_k)$ is isomorphic to the graph on the
nodes $\{0,1\}^{k+1}$ in which two nodes $(i_1,\dots,i_{k+1})$ and $(j_1,\dots,j_{k+1})$ are adjacent if either
$i_{k+1} = j_{k+1}$ and $\|i - j\|_\infty = 1$, or if $i_{k+1} \ne j_{k+1}$. For any $k \in \mathbb{N}_{>0}$, let $S_k := \{(0,i,0) :
i \in \{0,1\}^{k-1}\} \cup \{(0,i,1) : i \in \{0,1\}^{k-1}\}$, then $|S_k| = \frac{1}{2}|\mathcal{F}_{H_k,\, e_{2k+1}}|$. Counting the
edges leaving $S_k$, for any $(0,i,0) \in S_k$ there are $k$ many with endpoints in $\{(1,i,0) :
i \in \{0,1\}^{k-1}\}$ and $2^{k-1}$ with endpoints in $\{(1,i,1) : i \in \{0,1\}^{k-1}\}$. The same is
true for any $(0,i,1) \in S_k$. Hence, there are $(k + 2^{k-1}) \cdot 2 \cdot 2^{k-1}$ edges leaving $S_k$.
The edge-expansion of $\mathcal{F}_{H_k,\, e_{2k+1}}(\mathrm{Gr}_k)$ is thus bounded from above by $k + 2^{k-1}$. Since
|Grfc| = 2 ■ + Ak) and log = ^ + 1 , the claim follows. □ 